Analyze and Visualize Chicago Uber/Lyft Trips

Case Study Cover Image

💎 Case overview

Four Transportation Network Providers (often called rideshare companies 🚗) have been licensed to operate in Chicago. These companies are required to routinely report vehicle, driver, and trip information to the City of Chicago, which publishes it on the Chicago Data Portal. The latest trips dataset can be downloaded from this page.

🐢 Your dataset

⚔️ Your goal

Analyze the dataset and answer the following questions:


▶️ Run the code cell below to import unittest, a module used for 🧭 Check Your Work sections and the autograder.

🎯 Enter your NetID

🧭 Check Your NetID

If the code cell below doesn't throw an error, you're ready to begin this assignment.


🔨 Import packages and dataset

▶️ Run the code below to ensure you're using the correct version of plotly.

▶️ Run the code cell below to import packages used in the case.

🧭 Check Plotly Version

Run the code below to ensure that your notebook uses the same Plotly version as the autograder.

▶️ Run the code below to import and process the trips dataset.


📐 Part 1: Data overview


🎯 Deliverable 1: First 5 rows

👇 Tasks

🚀 Hint

my_dataframe.head()

🧭 Check your work


🎯 Deliverable 2: Summary of DataFrame

👇 Tasks

🚀 Hint

my_dataframe.info()
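As a rough illustration of the overview calls used in this part, the sketch below builds a tiny stand-in DataFrame and inspects it. The variable name `toy`, its columns, and its values are invented for the example and are not taken from the real trips dataset.

```python
import pandas as pd

# Hypothetical stand-in for the trips dataset; columns and values are invented.
toy = pd.DataFrame({
    "start": pd.to_datetime(["2019-07-04 17:15", "2020-07-04 17:45", "2020-03-01 08:00"]),
    "fare": [12.5, 8.0, 20.0],
    "tip": [2.0, 0.0, 3.0],
})

first_rows = toy.head(2)        # first 2 rows (head() returns 5 rows by default)
toy.info()                      # prints column names, non-null counts, and dtypes
num_rows, num_cols = toy.shape  # shape is a (rows, columns) tuple
print(num_rows, num_cols)
```

Note that `info()` prints its summary and returns `None`, so there is nothing useful to assign from it.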

🎯 Deliverable 3: Number of rows and columns in the dataset

👇 Tasks

🧭 Check Your Work


🗓️ Part 2: Extract datetime values into separate columns

The start column contains trip start timestamps. In this part of the case study, you will extract year, month, day of the month, day of the week, hour, and weekday/weekend information into separate columns.


🎯 Deliverable 4: Extract year into a new column

▶️ Run the code below to print the first 3 values of the start column and the data types.

👇 Tasks

🔥 Solution

Code

👆 date_series.dt is an accessor object for the datetime-like values of a Series. You can refer to the documentation here.
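A minimal sketch of the .dt accessor on a toy Series is shown below. The DataFrame `toy` and the new column names (`day`, `day_of_week`, and so on) are assumptions made for illustration; the column names expected by the autograder may differ.

```python
import pandas as pd

# Hypothetical timestamps standing in for the `start` column.
toy = pd.DataFrame({
    "start": pd.to_datetime(["2019-07-04 17:15:00", "2020-03-01 08:30:00"])
})

# Each .dt attribute returns a Series aligned with the original rows.
toy["year"] = toy["start"].dt.year
toy["month"] = toy["start"].dt.month
toy["day"] = toy["start"].dt.day
toy["day_of_week"] = toy["start"].dt.dayofweek  # Monday=0 ... Sunday=6
toy["hour"] = toy["start"].dt.hour              # 0-23

print(toy)
```

The accessor only works on datetime-typed columns, so a column stored as strings would need `pd.to_datetime` first.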

🧭 Check Your Work


🎯 Deliverable 5: Extract month, day of month, day of week, and hour into columns

👇 Tasks

🧭 Check Your Work


🎯 Deliverable 6: Create weekday_weekend column

👇 Tasks

🚀 Hint

There are many ways to achieve this task.

The code below creates a new column named cheap_expensive whose value is the string 'cheap' if the price is less than or equal to 10 and 'expensive' otherwise.

my_dataframe['cheap_expensive'] = np.where(my_dataframe['price'] <= 10, 'cheap', 'expensive')
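To see the hint run end to end, here is a self-contained version on a toy DataFrame. The `price` values are made up, and `toy` is a hypothetical name used only for this example.

```python
import numpy as np
import pandas as pd

# Invented prices; 10.0 sits exactly on the boundary to show that <= is inclusive.
toy = pd.DataFrame({"price": [4.0, 10.0, 10.5, 25.0]})

# np.where(condition, value_if_true, value_if_false) works element-wise.
toy["cheap_expensive"] = np.where(toy["price"] <= 10, "cheap", "expensive")

print(toy)
```

The same pattern applies to any two-way label, as long as the condition is a boolean Series with one entry per row.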

🧭 Check Your Work


😷 Part 3: Visualize the effects of COVID-19 on the volume of ridesharing trips

Although the first U.S. case of COVID-19 was reported in January 2020, the public did not begin responding to it in earnest until March 2020.

How did COVID-19 affect the volume of ridesharing trips? 💨 Let's visualize and compare the monthly number of trips for both 2019 and 2020.
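One possible way to get monthly figures per year is a grouped aggregation. The sketch below assumes per-trip `year`, `month`, and `tip_pct` columns already exist; the data is invented, and the output names `num_trips` and `tip_pct` are chosen only to match the names used in the plotting hints.

```python
import pandas as pd

# Invented per-trip records; one row per trip.
toy = pd.DataFrame({
    "year":    [2019, 2019, 2019, 2020, 2020],
    "month":   [1, 1, 2, 1, 2],
    "tip_pct": [0.10, 0.20, 0.00, 0.30, 0.10],
})

# Named aggregation: count rows and average the tip percentage per (year, month).
monthly = (
    toy.groupby(["year", "month"], as_index=False)
       .agg(num_trips=("tip_pct", "size"), tip_pct=("tip_pct", "mean"))
)

print(monthly)
```

Passing `as_index=False` keeps `year` and `month` as ordinary columns, which is convenient when the result is handed to a plotting function.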


🎯 Deliverable 7: Total number of trips in 2019 and 2020

👇 Tasks

🚀 Hint

For num_2019_trips, retrieve the number of rows where df['year'] is 2019.
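As a sketch of the idea, the snippet below counts matching rows on a toy `year` column in three equivalent ways. The data and the `count_via_*` names are invented for illustration.

```python
import pandas as pd

# Invented year values; three of the five rows are from 2019.
toy = pd.DataFrame({"year": [2019, 2019, 2020, 2019, 2020]})

mask = toy["year"] == 2019           # boolean Series, True where the row matches

count_via_len = len(toy[mask])       # length of the filtered DataFrame
count_via_shape = toy[mask].shape[0] # first element of (rows, columns)
count_via_sum = int(mask.sum())      # True counts as 1, False as 0

print(count_via_len, count_via_shape, count_via_sum)
```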

🧭 Check Your Work


🎯 Deliverable 8: Monthly number of trips and tip percentage

👇 Tasks

Code

🧭 Check Your Work


🎯 Deliverable 9: Sunburst Chart of Monthly Trips

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace my_dataframe, 'column1', 'column2', 'column3', and ...s with your own values from the code below.

fig = px.sunburst(
    my_dataframe,
    path=['column1', 'column2'],
    values='column3',
    title='Your Title Here',
    width=...,
    height=...
)
fig.show()

🧭 Check Your Work


🎯 Deliverable 10: Monthly number of trips in 2019 and 2020 (facet grid bar chart)

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace my_dataframe, 'column1', 'column2', 'column3', and ...s with your own values from the code below.

fig = px.bar(
    my_dataframe,
    title='Your Title Here',
    x='column1',
    y='column2',
    facet_col='column3',
    width=...,
    height=...,
    template='plotly_dark',
    color='num_trips',
    color_continuous_scale=['White', 'Yellow']
)
fig.show()

🧭 Check Your Work


🎯 Deliverable 11: Monthly number of trips in 2019 and 2020 (line plot)

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace my_dataframe, 'column1', 'column2', 'column3', and ...s with your own values from the code below.

fig = px.line(
    my_dataframe,
    title='Your Title Here',
    x='column1',
    y='column2',
    color='column3',
    template='plotly_dark',
    width=...,
    height=...
)
fig.show()

🧭 Check Your Work


📌 Interpreting the pre-COVID vs post-COVID monthly number of trips


💵 Part 4: Visualize the effect of COVID-19 on tips

Did passengers tip more on average during the pandemic since they appreciated the drivers providing services during risky times?

Or did passengers tip less on average because the pandemic devastated the nation's economy in 2020?


🎯 Deliverable 12: Monthly average tip percentages in 2019 and 2020 (facet grid bar chart)

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace my_dataframe, 'column1', 'column2', 'column3', and ...s with your own values from the code below.

fig = px.bar(
    my_dataframe,
    title='Your Title Here',
    x='column1',
    y='column2',
    facet_col='column3',
    width=...,
    height=...,
    template='plotly_dark',
    color='tip_pct',
    color_continuous_scale=['White', 'GreenYellow']
)
fig.update_layout(yaxis_tickformat='%')
fig.update_layout(yaxis2_tickformat='%')
fig.show()

🧭 Check Your Work


🎯 Deliverable 13: Monthly average tip percentages in 2019 and 2020

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace my_dataframe, 'column1', 'column2', 'column3', and ...s with your own values from the code below.

fig = px.line(
    my_dataframe,
    title='Your Title Here',
    x='column1',
    y='column2',
    color='column3',
    template='plotly_dark',
    width=...,
    height=...
)
fig.update_layout(yaxis_tickformat='%')
fig.show()

🧭 Check Your Work


📌 Interpreting the pre-COVID vs post-COVID average percentage of tips


✨ Part 5: Trips right before July 4th fireworks

Every July 4th, massive crowds gather around places (or boats) to view the fireworks. How did July 4th become the "national fireworks day"? July 4, 1776 is considered to be the birth of the United States of America as an independent nation. The Continental Congress approved the final wording of the Declaration of Independence on July 4, 1776. The first-ever recorded July 4th fireworks celebration was held in Philadelphia on July 4, 1777. Since then, there hasn't been an Independence Day without fireworks. 💥💥

[Source 1] [Source 2]

In this part, you will find trips that started on July 4th between 5 and 6 PM and create different scatter plots based on those trips.


🎯 Deliverable 14: Filter July 4th 5-6 PM trips

👇 Tasks

🚀 Hint

The code below filters rows where column1 is 10, column2 is 20, and column3 is 30. The filtered DataFrame is stored in a new variable named my_filtered.

my_filtered = my_df[(my_df['column1'] == 10) & (my_df['column2'] == 20) & (my_df['column3'] == 30)]
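The same pattern can be tried on a toy DataFrame, as sketched below. The `month`, `day`, and `hour` column names and values are assumptions made for the example, and treating hour 17 as the 5 PM to 6 PM window is an interpretation rather than something the assignment states.

```python
import pandas as pd

# Invented rows; only the first one is July 4th in the 17:00-17:59 hour.
toy = pd.DataFrame({
    "month": [7, 7, 7, 12],
    "day":   [4, 4, 5, 25],
    "hour":  [17, 9, 17, 17],
})

# Each comparison needs its own parentheses because & binds tighter than ==.
july_4_5pm = toy[(toy["month"] == 7) & (toy["day"] == 4) & (toy["hour"] == 17)]

print(july_4_5pm)
```

Using Python's `and` instead of `&` raises an error here, since `and` cannot combine whole Series element-wise.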

🧭 Check Your Work


🎯 Deliverable 15: July 4th 5-6 PM trips scatter plots

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace my_dataframe, 'column1', 'column2', 'column3', 'column4', 'column5', and ...s with your own values from the code below.

fig = px.scatter(
    my_dataframe,
    title='Your Title Here',
    x='column1',
    y='column2',
    size='column3',
    facet_col='column4',
    color='column5',
    template='plotly_dark',
    width=...,
    height=...,
)
fig.show()

🧭 Check Your Work


📌 Interpreting the pre-COVID vs post-COVID scatter plots


🎯 Deliverable 16: July 4th 5-6 PM trips 3D scatter plots (Pre-COVID)

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace 'column1', 'column2', 'column3', 'column4', and ...s with your own values from the code below.

fig = px.scatter_3d(
    df_july_fourth[df_july_fourth['year'] == 2019],
    title='Your Title Here',
    x='column1',
    y='column2',
    z='column3',
    color='column4',
    template='plotly_dark',
    width=...,
    height=...
)
fig.show()

🧭 Check Your Work

▶️ Run the code cell below to repeat the previous deliverable with trips in 2020.


📌 Interpreting the pre-COVID vs post-COVID 3D scatter plots


🏡 Part 6: Pickup/dropoff area analysis

In this part, you will find the top 20 pickup areas and analyze the trips originating from those areas.


🎯 Deliverable 17: Find top 20 pickup areas by number of trips

👇 Tasks

🔥 Solution

Code

🧭 Check your work


🎯 Deliverable 18: Filter trips from top 20 pickup areas

👇 Tasks

🚀 Hint

filtered_dataframe = my_dataframe[my_dataframe['column1'].isin(my_top_20_list)]
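A small sketch of the idea on toy data follows: rank categories by frequency, keep the top few, then filter with isin. The area names and the top-2 cutoff are illustrative only (the deliverable asks for the top 20), and `toy`, `top_2`, and `filtered` are hypothetical names.

```python
import pandas as pd

# Invented pickup areas: Loop appears 3 times, O'Hare twice, Uptown once.
toy = pd.DataFrame({
    "pickup_area": ["Loop", "Loop", "Loop", "O'Hare", "O'Hare", "Uptown"]
})

# value_counts() sorts by frequency (descending), so head(2) keeps the two busiest areas.
top_2 = toy["pickup_area"].value_counts().head(2).index.tolist()

# isin() returns True for rows whose value appears in the list.
filtered = toy[toy["pickup_area"].isin(top_2)]

print(top_2)
print(len(filtered))
```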

🧭 Check your work


🎯 Deliverable 19: Number of trips and average trip total by pickup_area

👇 Tasks

🔥 Solution

Code

🧭 Check your work


🎯 Deliverable 20: Number of trips by pickup area (bar chart)

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace my_dataframe, 'column1', 'column2', and ...s with your own values from the code below.

fig = px.bar(
    my_dataframe,
    title='Your Title Here',
    x='column1',
    y='column2',
    color='column1',
    color_continuous_scale='emrld',
    text='column1',
    template='plotly_white',
    height=...
)
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_yaxes(categoryorder='total ascending')
fig.show()

🧭 Check Your Work


🎯 Deliverable 21: Average trip total by pickup area (bar chart)

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace my_dataframe, 'column1', 'column2', and ...s with your own values from the code below.

fig = px.bar(
    my_dataframe,
    title='Your Title Here',
    x='column1',
    y='column2',
    text='column1',
    template='plotly_white',
    height=...
)
fig.update_traces(texttemplate='$%{text:.1f}', textposition='outside')
fig.update_yaxes(categoryorder='total ascending')
fig.show()

🧭 Check Your Work


🎯 Deliverable 22: Pickup area breakdown (treemap)

👇 Tasks

🔑 Sample Output

Sample Output

🚀 Hint

Replace my_dataframe, 'column1', and 'column2' with your own values from the code below.

fig = px.treemap(
    my_dataframe,
    title='Pickup Area Breakdown',
    path=['column1'],
    values='column2',
    height=600
)
fig.show()

🧭 Check Your Work


📌 Interpreting the pickup area visualizations


🦄 Part 7: Pickup area + weekday/weekend

In this part, you will add a new dimension (weekday/weekend) to the top 20 pickup areas.
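Adding a dimension usually amounts to grouping by two columns instead of one. The sketch below shows this on invented data; `trip_total` and the output names `num_trips` and `avg_trip_total` are assumptions for illustration and may not match the names used in the provided code cell.

```python
import pandas as pd

# Invented trips; each row is one trip with its area, day type, and total cost.
toy = pd.DataFrame({
    "pickup_area":     ["Loop", "Loop", "Loop", "Uptown"],
    "weekday_weekend": ["weekday", "weekday", "weekend", "weekend"],
    "trip_total":      [10.0, 20.0, 30.0, 8.0],
})

# One output row per (pickup_area, weekday_weekend) pair that occurs in the data.
summary = (
    toy.groupby(["pickup_area", "weekday_weekend"], as_index=False)
       .agg(num_trips=("trip_total", "size"), avg_trip_total=("trip_total", "mean"))
)

print(summary)
```

A result shaped like this, with one column per hierarchy level plus a numeric column, is the kind of input that path-based charts such as sunbursts and treemaps expect.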

▶️ Run the code cell below to create the number of trips and average trip totals by pickup area and weekday/weekend classification.


🎯 Deliverable 23: Pickup area + weekday/weekend breakdown (sunburst)

👇 Tasks

🔥 Solution

Code


🎯 Deliverable 24: Pickup area + weekday/weekend breakdown (treemap)

👇 Tasks

🔥 Solution

Code


🍸 Submitting your notebook

There is one final step before exporting the notebook as an .ipynb file for submission. You should restart your runtime (kernel) and run all cells from the beginning to ensure that your notebook is structured properly.

Go to the "Runtime" menu at the top ("Kernel" if you're on JupyterLab) and select "Restart and run all". Skipping this step may result in a significant loss of points, since the autograder will fail to run a notebook that does not execute cleanly from top to bottom.
